Semi-supervised clustering via multi-level random walk

Authors

  • Ping He
  • Xiao-hua Xu
  • Kongfa Hu
  • Ling Chen
Abstract

A key issue in semi-supervised clustering is how to utilize the limited but informative pairwise constraints. In this paper, we propose a new graph-based constrained clustering algorithm, named SCRAWL. It is composed of two random walks with different granularities. In the lower-level random walk, SCRAWL partitions the vertices (i.e., data points) into constrained and unconstrained ones, according to whether they appear in the pairwise constraints. For every constrained vertex, its influence range, or the degrees of influence it exerts on the unconstrained vertices, is encapsulated in an intermediate structure called a component. The edge set between each pair of components determines the affecting scope of the pairwise constraints. In the higher-level random walk, SCRAWL enforces the pairwise constraints on the components, so that the constraint influence can be propagated to the unconstrained edges. Finally, we combine the cluster membership of all the components to obtain the cluster assignment for each vertex. The promising experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our method. © 2013 Elsevier Ltd. All rights reserved.
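The abstract does not give SCRAWL's update rules, so the following is only a minimal sketch of the general idea behind the lower-level step, not the authors' algorithm. It assumes a symmetric affinity matrix `W` and swaps in a standard absorbing random walk: constrained vertices act as absorbing states, and the absorption probabilities stand in for the "degrees of influence" a constrained vertex exerts on unconstrained ones. The function names `split_vertices` and `absorption_probabilities` are hypothetical.

```python
import numpy as np

def split_vertices(n, must_link, cannot_link):
    """Partition vertex indices 0..n-1 by whether they occur in any pairwise constraint."""
    in_constraint = {v for pair in list(must_link) + list(cannot_link) for v in pair}
    constrained = sorted(in_constraint)
    unconstrained = [v for v in range(n) if v not in in_constraint]
    return constrained, unconstrained

def absorption_probabilities(W, constrained, unconstrained):
    """Row i, column j: probability that a walk started at unconstrained[i]
    reaches constrained[j] before any other constrained vertex.
    Assumption: this absorbing-walk quantity is a stand-in for 'influence',
    not the definition used in the paper."""
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    Q = P[np.ix_(unconstrained, unconstrained)]   # unconstrained -> unconstrained
    R = P[np.ix_(unconstrained, constrained)]     # unconstrained -> constrained
    # Solve (I - Q) B = R instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(len(unconstrained)) - Q, R)

# Toy example: a path graph 0-1-2-3-4 with one cannot-link constraint (0, 4).
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

constrained, unconstrained = split_vertices(5, must_link=[], cannot_link=[(0, 4)])
B = absorption_probabilities(W, constrained, unconstrained)
print(constrained, unconstrained)
print(B)
```

In this toy graph each row of `B` sums to 1, and vertices closer to a constrained endpoint are influenced more strongly by it. Grouping each unconstrained vertex with the constrained vertex that influences it most would be one simple way to form component-like structures, again as an illustration only.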

Similar resources

Maximum Consistency Preferential Random Walks

Random walk plays a significant role in computer science. The popular PageRank algorithm uses random walk. Personalized random walks force random walk to “personalized views” of the graph according to users’ preferences. In this paper, we show the close relations between different preferential random walks and label propagation methods used in semi-supervised learning. We further present a maxi...

Semi-supervised Clustering of Medical Text

Semi-supervised clustering is an attractive alternative for traditional (unsupervised) clustering in targeted applications. By using the information of a small annotated dataset, semi-supervised clustering can produce clusters that are customized to the application domain. In this paper, we present a semi-supervised clustering technique based on a multi-objective evolutionary algorithm (NSGA-II...

Clustering Heterogeneous Data with Mutual Semi-supervision

We propose a new methodology for clustering data comprising multiple domains or parts, in such a way that the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume the presence of labels from part of the data, but rather, each of the different domains of the data separately un...

A Novel Multi Label Learning Based on Clustering Integrated Ensemble Classifier Chain Micro Prediction Models

Most of the real world problems are concerned with assignment of multiple target labels to the instances. The proposed model aims to increase the accuracy by incorporating supervised and semi supervised learning. K Means clustering is employed which creates K clusters based on the initialization of cluster centroids. Datasets are clustered based on its distribution in the Euclidean space. Clust...

Semi-supervised Learning of a Markovian Metric

The role of a distance metric in many supervised and semi-supervised learning applications is central in the success of clustering algorithms. Since existing metrics like Euclidean do not necessarily reflect the true structure (clusters or manifolds) in the data, it becomes imperative that an appropriate metric be somehow learned from training or labeled data. Metric learning has been a relativ...

Journal title:
  • Pattern Recognition

Volume 47  Issue 

Pages  -

Publication date: 2014